<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <title>Secondary indexes</title>
    <link rel="stylesheet" href="gettingStarted.css" type="text/css" />
    <meta name="generator" content="DocBook XSL Stylesheets V1.73.2" />
    <link rel="start" href="index.html" title="Berkeley DB Programmer's Reference Guide" />
    <link rel="up" href="am.html" title="Chapter 3.  Access Method Operations" />
    <link rel="prev" href="am_close.html" title="Database close" />
    <link rel="next" href="am_foreign.html" title="Foreign key indexes" />
  </head>
  <body>
    <div xmlns="" class="navheader">
      <div class="libver">
        <p>Library Version 12.1.6.2</p>
      </div>
      <table width="100%" summary="Navigation header">
        <tr>
          <th colspan="3" align="center">Secondary indexes</th>
        </tr>
        <tr>
          <td width="20%" align="left"><a accesskey="p" href="am_close.html">Prev</a> </td>
          <th width="60%" align="center">Chapter 3.  Access Method Operations </th>
          <td width="20%" align="right"> <a accesskey="n" href="am_foreign.html">Next</a></td>
        </tr>
      </table>
      <hr />
    </div>
    <div class="sect1" lang="en" xml:lang="en">
      <div class="titlepage">
        <div>
          <div>
            <h2 class="title" style="clear: both"><a id="am_second"></a>Secondary indexes</h2>
          </div>
        </div>
      </div>
      <div class="toc">
        <dl>
          <dt>
            <span class="sect2">
              <a href="am_second.html#idm140526459658848">Error Handling With Secondary Indexes</a>
            </span>
          </dt>
        </dl>
      </div>
      <p>
        A secondary index, put simply, is a way to efficiently
        access records in a database (the primary) by means of some
        piece of information other than the usual (primary) key. In
        Berkeley DB, this index is simply another database whose keys
        are these pieces of information (the secondary keys), and
        whose data are the primary keys. Secondary indexes can be
        created manually by the application; there is no disadvantage,
        other than complexity, to doing so. However, when the
        secondary key can be mechanically derived from the primary key
        and datum that it points to, as is frequently the case,
        Berkeley DB can automatically and transparently manage
        secondary indexes.
    </p>
      <p>
        As an example of how secondary indexes might be used,
        consider a database containing a list of students at a
        college, each of whom has a unique student ID number. A
        typical database would use the student ID number as the key;
        however, one might also reasonably want to be able to look up
        students by last name. To do this, one would construct a
        secondary index in which the secondary key was this last
        name.
    </p>
      <p>
        In SQL, this would be done by executing something like the
        following:
    </p>
      <pre class="programlisting">CREATE TABLE students(student_id CHAR(4) NOT NULL,
    lastname CHAR(15), firstname CHAR(15), PRIMARY KEY(student_id));
CREATE INDEX lname ON students(lastname);</pre>
      <p>
        In Berkeley DB, this would work as follows (a <a class="ulink" href="second.javas" target="_top">Java API example is also
        available</a>):
    </p>
      <a id="prog_am13"></a>
      <pre class="programlisting">struct student_record {
    char student_id[4];
    char last_name[15];
    char first_name[15];
};

....

void
second()
{
    DB *dbp, *sdbp;
    int ret;
    
    /* Open/create primary */
    if ((ret = db_create(&amp;dbp, dbenv, 0)) != 0)
        handle_error(ret);
    if ((ret = dbp-&gt;open(dbp, NULL,
        "students.db", NULL, DB_BTREE, DB_CREATE, 0600)) != 0)
        handle_error(ret);
    
    /*
     * Open/create secondary.  Note that it supports duplicate data
     * items, since last names might not be unique.
     */
    if ((ret = db_create(&amp;sdbp, dbenv, 0)) != 0)
        handle_error(ret);
    if ((ret = sdbp-&gt;set_flags(sdbp, DB_DUP | DB_DUPSORT)) != 0)
        handle_error(ret);
    if ((ret = sdbp-&gt;open(sdbp, NULL,
        "lastname.db", NULL, DB_BTREE, DB_CREATE, 0600)) != 0)
        handle_error(ret);
    
    /* Associate the secondary with the primary. */
    if ((ret = dbp-&gt;associate(dbp, NULL, sdbp, getname, 0)) != 0)
        handle_error(ret);
}

/*
 * getname -- extracts a secondary key (the last name) from a primary
 *     key/data pair
 */
int
getname(DB *secondary, const DBT *pkey, const DBT *pdata, DBT *skey)
    
{
    /*
     * Since the secondary key is a simple structure member of the
     * record, we don't have to do anything fancy to return it.  If
     * we have composite keys that need to be constructed from the
     * record, rather than simply pointing into it, then the user's
     * function might need to allocate space and copy data.  In
     * this case, the DB_DBT_APPMALLOC flag should be set in the
     * secondary key DBT.
     */
    memset(skey, 0, sizeof(DBT));
    skey-&gt;data = ((struct student_record *)pdata-&gt;data)-&gt;last_name;
    skey-&gt;size = sizeof(((struct student_record *)pdata-&gt;data)-&gt;last_name);
    return (0);
}</pre>
      <p>
        From the application's perspective, putting things into the
        database works exactly as it does without a secondary index;
        one can simply insert records into the primary database. In
        SQL one would do the following:
    </p>
      <pre class="programlisting">INSERT INTO student
    VALUES ("WC42", "Churchill      ", "Winston        ");</pre>
      <p>
        and in Berkeley DB, one does:
    </p>
      <a id="prog_am14"></a>
      <pre class="programlisting">struct student_record s;
DBT data, key;

memset(&amp;key, 0, sizeof(DBT));
memset(&amp;data, 0, sizeof(DBT));
memset(&amp;s, 0, sizeof(struct student_record));
key.data = "WC42";
key.size = 4;
memcpy(&amp;s.student_id, "WC42", sizeof(s.student_id));
memcpy(&amp;s.last_name, "Churchill      ", sizeof(s.last_name));
memcpy(&amp;s.first_name, "Winston        ", sizeof(s.first_name));
data.data = &amp;s;
data.size = sizeof(s);
if ((ret = dbp-&gt;put(dbp, txn, &amp;key, &amp;data, 0)) != 0)
    handle_error(ret);
    </pre>
      <p>Internally, a record with secondary key "Churchill" is
        inserted into the secondary database (in addition to the
        insertion of "WC42" into the primary, of course).</p>
      <p>Deletes are similar. The SQL clause:</p>
      <pre class="programlisting">DELETE FROM student WHERE (student_id = "WC42");</pre>
      <p>looks like:</p>
      <a id="prog_am15"></a>
      <pre class="programlisting">DBT key;

memset(&amp;key, 0, sizeof(DBT));
key.data = "WC42";
key.size = 4;
if ((ret = dbp-&gt;del(dbp, txn, &amp;key, 0)) != 0)
    handle_error(ret);</pre>
      <p>
        Deletes can also be performed on the secondary index
        directly; a delete done this way will delete the "real" record
        in the primary as well. If the secondary supports duplicates
        and there are duplicate occurrences of the secondary key, then
        all records with that secondary key are removed from both the
        secondary index and the primary database. In SQL:
    </p>
      <pre class="programlisting">DELETE FROM lname WHERE (lastname = "Churchill      ");</pre>
      <p>
        In Berkeley DB:
    </p>
      <a id="prog_am16"></a>
      <pre class="programlisting">DBT skey;

memset(&amp;skey, 0, sizeof(DBT));
skey.data = "Churchill      ";
skey.size = 15;
if ((ret = sdbp-&gt;del(sdbp, txn, &amp;skey, 0)) != 0)
    handle_error(ret);</pre>
      <p>
        Gets on a secondary automatically return the primary datum.
        If <a href="../api_reference/C/dbget.html" class="olink">DB-&gt;pget()</a> or <a href="../api_reference/C/dbcget.html" class="olink">DBC-&gt;pget()</a> is used in lieu of <a href="../api_reference/C/dbget.html" class="olink">DB-&gt;get()</a> or
        <a href="../api_reference/C/dbcget.html" class="olink">DBC-&gt;get()</a>, the primary key is returned as well. Thus, the
        equivalent of:
    </p>
      <pre class="programlisting">SELECT * from lname WHERE (lastname = "Churchill      ");</pre>
      <p>
        would be:
    </p>
      <a id="prog_am17"></a>
      <pre class="programlisting">DBT data, pkey, skey;

memset(&amp;skey, 0, sizeof(DBT));
memset(&amp;pkey, 0, sizeof(DBT));
memset(&amp;data, 0, sizeof(DBT));
skey.data = "Churchill      ";
skey.size = 15;
if ((ret = sdbp-&gt;pget(sdbp, txn, &amp;skey, &amp;pkey, &amp;data, 0)) != 0)
    handle_error(ret);
/*
 * Now pkey contains "WC42" and data contains Winston's record.
 */</pre>
      <p>
        To create a secondary index to a Berkeley DB database, open
        the database that is to become a secondary index normally,
        then pass it as the "secondary" argument to the <a href="../api_reference/C/dbassociate.html" class="olink">DB-&gt;associate()</a>
        method for some primary database.
    </p>
      <p>
        After a <a href="../api_reference/C/dbassociate.html" class="olink">DB-&gt;associate()</a> call is made, the secondary indexes
        become alternate interfaces to the primary database. All
        updates to the primary will be automatically reflected in each
        secondary index that has been associated with it. All get
        operations using the <a href="../api_reference/C/dbget.html" class="olink">DB-&gt;get()</a> or <a href="../api_reference/C/dbcget.html" class="olink">DBC-&gt;get()</a> methods on the
        secondary index return the primary datum associated with the
        specified (or otherwise current, in the case of cursor
        operations) secondary key. The <a href="../api_reference/C/dbget.html" class="olink">DB-&gt;pget()</a> and <a href="../api_reference/C/dbcget.html" class="olink">DBC-&gt;pget()</a> methods
        also become usable; these behave just like <a href="../api_reference/C/dbget.html" class="olink">DB-&gt;get()</a> and
        <a href="../api_reference/C/dbcget.html" class="olink">DBC-&gt;get()</a>, but return the primary key in addition to the
        primary datum, for those applications that need it as
        well.
    </p>
      <p>
        Cursor get operations on a secondary index perform as
        expected; although the data returned will by default be those
        of the primary database, a position in the secondary index is
        maintained normally, and records will appear in the order
        determined by the secondary key and the comparison function or
        other structure of the secondary database.
    </p>
      <p>
        Delete operations on a secondary index delete the item from
        the primary database and all relevant secondaries, including
        the current one.
    </p>
      <p>
        Put operations of any kind are forbidden on secondary
        indexes, as there is no way to specify a primary key for a
        newly put item. Instead, the application should use the
        <a href="../api_reference/C/dbput.html" class="olink">DB-&gt;put()</a> or <a href="../api_reference/C/dbcput.html" class="olink">DBC-&gt;put()</a> methods on the primary database.
    </p>
      <p>
        Any number of secondary indexes may be associated with a
        given primary database, up to limitations on available memory
        and the number of open file descriptors.
    </p>
      <p>
        Note that although Berkeley DB guarantees that updates made
        using any <a href="../api_reference/C/db.html" class="olink">DB</a> handle with an associated secondary will be
        reflected in the that secondary, associating each primary
        handle with all the appropriate secondaries is the
        responsibility of the application and is not enforced by
        Berkeley DB. It is generally unsafe, but not forbidden by
        Berkeley DB, to modify a database that has secondary indexes
        without having those indexes open and associated. Similarly,
        it is generally unsafe, but not forbidden, to modify a
        secondary index directly. Applications that violate these
        rules face the possibility of outdated or incorrect results if
        the secondary indexes are later used.
    </p>
      <p>
        If a secondary index becomes outdated for any reason, it
        should be discarded using the <a href="../api_reference/C/dbremove.html" class="olink">DB-&gt;remove()</a> method and a new one
        created using the <a href="../api_reference/C/dbassociate.html" class="olink">DB-&gt;associate()</a> method. If a secondary index
        is no longer needed, all of its handles should be closed using
        the <a href="../api_reference/C/dbclose.html" class="olink">DB-&gt;close()</a> method, and then the database should be removed
        using a new database handle and the <a href="../api_reference/C/dbremove.html" class="olink">DB-&gt;remove()</a> method.
    </p>
      <p>
        Closing a primary database handle automatically
        dis-associates all secondary database handles associated with
        it.
    </p>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="idm140526459658848"></a>Error Handling With Secondary Indexes</h3>
            </div>
          </div>
        </div>
        <p>
            An error return during a secondary update in CDS or DS
            (which requires an abort in TDS) may leave a secondary
            index inconsistent in CDS or DS. There are a few non-error
            returns:
        </p>
        <div class="itemizedlist">
          <ul type="disc">
            <li>0 </li>
            <li>DB_BUFFER_SMALL </li>
            <li>DB_NOTFOUND </li>
            <li>DB_KEYEMPTY </li>
            <li> DB_KEYEXIST </li>
          </ul>
        </div>
        <p> 
            In the case of any other error return during a
            secondary update in CDS or DS, delete the secondary
            indices, recreate them and set the <code class="literal">DB_CREATE
            flag</code> to the <code class="literal">DB-&gt;associate</code>
            method. Some examples of error returns that need to be
            handled this way are:
        </p>
        <div class="itemizedlist">
          <ul type="disc">
            <li>ENOMEM - indicating there is insufficient memory
                to return the requested item</li>
            <li>EINVAL - indicating that an invalid flag value
                or parameter is specified</li>
          </ul>
        </div>
        <p>
            Note that <code class="literal">DB_RUNRECOVERY</code> and
            <code class="literal">DB_PAGE_NOTFOUND</code> are fatal errors
            which should never occur during normal use of CDS or DS.
            If those errors are returned by Berkeley DB when running
            without transactions, check the database integrity with
            the <code class="literal">DB-&gt;verify</code> method before rebuilding
            the secondary indices.
        </p>
      </div>
    </div>
    <div class="navfooter">
      <hr />
      <table width="100%" summary="Navigation footer">
        <tr>
          <td width="40%" align="left"><a accesskey="p" href="am_close.html">Prev</a> </td>
          <td width="20%" align="center">
            <a accesskey="u" href="am.html">Up</a>
          </td>
          <td width="40%" align="right"> <a accesskey="n" href="am_foreign.html">Next</a></td>
        </tr>
        <tr>
          <td width="40%" align="left" valign="top">Database close </td>
          <td width="20%" align="center">
            <a accesskey="h" href="index.html">Home</a>
          </td>
          <td width="40%" align="right" valign="top"> Foreign key indexes</td>
        </tr>
      </table>
    </div>
  </body>
</html>
